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Abstract 

It  has  recently  been  proved  that  the  popular  nonlocal  means  (NLM)  denoising 
algorithm  does  not  optimally  denoise  images  with  sharp  edges.  Its  weakness 
lies  in  the  isotropic  nature  of  the  neighborhoods  it  uses  to  set  its  smoothing 
weights.  In  response,  in  this  paper  we  introduce  several  theoretical  and 
practical  anisotropic  nonlocal  means  (ANLM)  algorithms  and  prove  that  they 
are  near  minimax  optimal  for  edge-dominated  images  from  the  Horizon  class. 
On  real-world  test  images,  an  ANLM  algorithm  that  adapts  to  the  underlying 
image  gradients  outperforms  NLM  by  a  significant  margin. 
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1.  Introduction 

Image  denoising  is  a  fundamental  primitive  in  image  processing  and  com¬ 
puter  vision.  Denoising  algorithms  have  evolved  from  the  classical  linear 
and  median  filters  to  more  modern  schemes  like  total  variation  denoising  [I], 
wavelet  thresholding  [2],  and  bilateral  filters  [3  6]. 

A  particularly  successful  denoising  scheme  is  the  nonlocal  means  (NLM) 
algorithm  [7] ,  which  estimates  each  pixel  value  as  a  weighted  average  of  other, 
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Figure  1:  An  example  of  a  Horizon  class  image  that  features  a  smooth  edge 
contour  that  separates  the  white  region  from  the  black  region. 


similar  noisy  pixels.  However,  instead  of  using  spatial  adjacency  or  noisy  pixel 
value  as  the  similarity  measure  to  adjust  the  estimate  weights,  NLM  uses  a 
more  reliable  notion  of  similarity  based  on  the  resemblance  of  the  pixels’ 
neighborhoods  in  high-dimensional  space.  This  unique  feature  benefits  NLM 
in  two  ways.  First,  it  provides  more  accurate  weight  estimates.  Second, 
it  enables  NLM  to  exploit  the  contribution  of  all  pixels  in  the  image.  In 
concert,  these  features  enable  NLM  to  provide  state-of-the-art  performance 
for  a  large  class  of  image  denoising  problems. 

Nevertheless,  in  a  recent  paper,  we  have  proved  that  NLM  does  not  attain 
optimal  performance  on  images  with  sharp  edges  from  the  so-called  Horizon 
class  (see  Figure  1)  [8].  Indeed,  NLM’s  theoretical  performance  is  more  or 
less  equivalent  to  wavelet  thresholding,  which  was  shown  to  be  suboptimal 
in  [9].  The  core  problem  is  that  NLM  (and  wavelet  thresholding)  cannot 
exploit  the  smoothness  of  the  edge  contour  that  separates  the  white  and 
black  regions.  Therefore,  there  is  room  for  improvement. 

In  this  paper,  we  introduce  and  study  a  new  denoising  framework  and 
prove  that  it  is  near-optimal  for  Horizon  class  images  with  sharp  edges. 
Anisotropic  nonlocal  means  (ANLM)  outperforms  NLM,  wavelet  threshold¬ 
ing,  and  more  classical  techniques  by  using  anisotropic  neighborhoods  that 
are  elongated  along  and  matched  to  the  local  edge  orientation.  Figure  2  com¬ 
pares  the  neighborhoods  used  in  ANLM  with  those  used  in  NLM.  Anisotropic 


2 


Figure  2:  Comparison  of  (left)  the  isotropic  neighborhoods  employed  by  Non 
Local  Means  (NLM)  versus  (right)  anisotropic  neighborhoods  employed  by 
Anisotropic  NLM  (ANLM). 


neighborhoods  enable  ANLM  to  distinguish  between  similar  and  dissimilar 
pixels  more  accurately. 

We  develop  three  different  ANLM  algorithms  of  increasing  levels  of  practi¬ 
cality.  Oracle  Anisotropic  Nonlocal  Means  (OANLM)  assumes  perfect  knowl¬ 
edge  of  the  local  orientation  of  the  edge  contour  and  is  used  primarily  for  our 
theoretical  optimality  analysis.  Discrete-angle  Anisotropic  Nonlocal  Means 
(DANLM)  optimizes  the  choice  of  the  anisotropic  neighborhood  around  each 
pixel  in  order  to  achieve  near-optimal  performance  without  any  oracle  in¬ 
formation.  Since  it  is  more  computationally  demanding  than  NLM,  we  in¬ 
troduce  an  algorithmic  simplification.  Gradient  based  Anisotropic  Nonlo¬ 
cal  Means  (GANLM)  uses  image  gradient  information  to  estimate  the  edge 
orientation;  using  simulations,  we  demonstrate  that  GANLM  significantly 
outperforms  NLM  in  practice  on  both  Horizon  class  and  real-world  images. 

The  paper  is  organized  as  follows.  Section  2  explains  the  minimax  frame¬ 
work  we  use  to  analyze  the  denoising  algorithms  and  reviews  the  necessary 
background.  Section  3  introduces  the  OANLM  and  DANLM  algorithms  and 
presents  the  main  theorems.  Section  4  addresses  some  of  the  practical  ANLM 
issues  by  introducing  GANLM  and  summarizes  the  results  of  a  range  of  simu¬ 
lations  using  synthetic  and  real-world  imagery.  Section  5  contains  the  proofs 
of  the  main  theorems.  Section  6  reviews  the  related  work  in  the  literature. 
Section  7  closes  with  a  discussion  of  our  current  and  potential  future  results. 
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2.  Minimax  analysis  framework 


In  this  section  we  introduce  the  minimax  framework  [2,  10]  and  the  Hori¬ 
zon  class  image  model  considered  in  this  paper.  Note  that,  in  order  to  stream¬ 
line  the  proofs,  we  take  a  continuous- variable  analysis  approach  in  this  paper, 
in  contrast  to  our  approach  in  [8].  The  moral  of  the  story  is  compatible  with 
[8],  however. 

2.1.  Risk 

We  are  interested  in  estimating  an  image  described  by  the  function  /  : 
[0,  l]2  — »■  [0,1]  (/  G  L2([0,  l]2))  from  its  noisy  observation 

dY (£i,  t2)  =  f{ti,t2)dtidt2  +  ad\V(ti,  t2).  (1) 

Without  loss  of  generality,  we  consider  only  square  images.  Here  W(t\,t2) 
is  the  Wiener  sheet,2  and  a  is  a  constant  that  scales  the  noise.  For  a  given 
function  /  and  a  given  estimator  /,  define  the  risk  as 

*(/,/)  =E(||/-/|g), 

where  the  expected  value  is  over  W.  The  risk  can  be  decomposed  into  bias 
squared  and  variance  terms 

R{f,  f)  =  11/  -  E/||!  +  E||/  -  E/|||. 

Let  /  belong  to  a  class  of  functions  T .  The  risk  of  the  estimator  /  on  T 
is  defined  as  the  risk  of  the  least- favorable  function,  i.e. , 

R{F >  /)  =  sup  R(f,  /). 
feT 

The  minimax  risk  over  the  class  of  functions  T  is  then  defined  as 

R*(T)  =  inf  sup  R(f,  /). 

/  fer 

R*  (  J-  )  is  a  lower  bound  for  the  performance  of  any  estimator  on  T . 

In  this  paper,  we  are  interested  in  the  asymptotic  setting  as  a  0.  For 
all  the  estimators  we  consider,  R{ T.  f)  — >  0  as  a  0.  Therefore,  following 
[11,  12]  we  will  consider  the  decay  rate  of  the  minimax  risk  as  our  measure 
of  performance. 


2The  Wiener  sheet  is  the  primitive  of  white  noise. 


4 


2.2.  Horizon  edge  model 

In  our  analysis,  we  consider  the  Horizon  model  that  contains  piecewise 
constant  images  with  sharp  step  edges  lying  along  a  smooth  contour  [9,  11,  13] 
(our  analysis  extends  easily  to  piecewise  smooth  edges).  Let  H  older0  (C) 
be  the  class  of  Holder  functions  on  1R,  defined  in  the  following  way:  h  £ 
Holder0 (C)  if  and  only  if 

\h(k)(t!)  -  h(k)(t[)\  <  C\h  -  t\\a~k , 

where  k  —  |_<aj .  Consider  a  transformation  that  maps  each  one-dimensional 
edge  contour  function  h  to  a  two-dimensional  image  //,  :  [0,  l]2  — >  R.  via 

fhihyh)  =  l{t2<h(*l)}- 

The  Horizon  class  of  images  is  then  defined  as 

H°(C)  =  {fh(ti,t2)  :  h  £  Holder0 (C)  D  H older1  { 1)},  (2) 

where  a  is  the  smoothness  of  the  edge  contour.  Figure  1  illustrates  a  sample 
function  of  this  class.  The  following  theorem  characterizes  the  minimax  rate 
of  H°{C)  [9,  11,  13]. 3 


Theorem  1.  [9,  11.  13]  For  a  >  1,  the  minimax  risk  of  the  class  H°(C)  is 

R*{H°(C))  =  . 

Achieving  the  minimax  rate  is  a  laudable  goal  that  any  well-respecting 
denoising  algorithm  should  aspire  to.  In  this  paper,  we  will  focus  primarily 
on  a  =  2  edge  contours,  for  which  the  optimal  minimax  decay  rate  is  <7-4/3. 
However,  it  is  straightforward  to  draw  similar  conclusions  for  the  other  values 
of  a. 


’The  models  considered  in  [9,  11,  13]  are  slightly  different  from  the  continuous  frame¬ 
work  of  this  paper.  Therefore,  for  the  sake  of  completeness  we  prove  Theorem  1  in  Ap¬ 
pendix  C. 
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Table  1:  Minimax  risk  decay  rates  of  several  classical  image  denoising  algo¬ 
rithms;  recall  that  the  optimal  minimax  rate  is  <r4/3. 


Algorithm 

Minimax  Rate 

Mean  filter  14] 

a2'* 

Wavelet  thresholding  [9,  11 

a1 

Nonlocal  means  8] 

a4 

2.3.  Image  denoising  algorithms 

Minimax  risk  analysis  of  the  classical  denoising  algorithms  has  revealed 
their  suboptimal  performance  on  images  with  sharp  edges.  Table  1  summa¬ 
rizes  several  of  these  algorithms  and  their  minimax  decay  rates  (up  to  a  log 
factor).4  The  suboptimality  of  wavelet  thresholding  has  led  to  the  develop¬ 
ment  of  ridgelets  [15],  curvelets  [16],  wedgelets  [9],  platelets  [17],  shear  lets 
[18],  contourlets  [19],  bandelets  [20],  directionlets  [21]  and  other  types  of 
directional  transforms.  See  [22]  and  the  references  therein  for  more  informa¬ 
tion.  In  the  framework  of  this  paper,  wedgelet  denoising  [9]  provably  achieves 
the  optimal  minimax  rate.  However,  it  performs  poorly  on  textures,  which 
has  limited  its  application  in  image  processing. 

2-4-  Nonlocal  means  denoising 

In  2005,  Buades,  Coll,  and  Morel  significantly  improved  the  performance 
of  the  bilateral  filter  [8]  by  incorporating  a  new  notion  of  pixel  similarity. 
As  in  the  bilateral  filter,  NLM  estimates  each  pixel  value  using  a  weighted 
average  of  other  pixel  values  in  the  image.  However,  NLM  sets  the  weights 
according  to  the  similarity  between  the  pixel  neighborhoods  rather  than  the 
pixel  values.  Furthermore,  in  contrast  to  the  bilateral  filter,  in  which  only 
the  vicinity  of  each  pixel  contributes  to  the  estimate,  in  NLM  all  pixels  may 
contribute. 


4The  analysis  framework  used  in  [14]  and  [8]  is  discrete  rather  than  continuous.  There¬ 
fore,  for  the  sake  of  completeness  we  establish  the  results  for  the  mean  filter  and  NLM  in 
Appendix  A  and  Appendix  B,  respectively. 


6 


Specifically,  the  NLM  algorithm  is  defined  as  follows.  Let  S  —  [0,  l]2 
represent  the  domain  of  the  image.  Define  the  ^-neighborhood  of  a  (t\,t2)  as 


h(t  1,  t2) 


s 

S' 

8 

<51 

h  -  2 

*1  +  2. 

x 

ti  ~ 

~  2 

**+2\ 

(3) 


Following  the  definition  of  NLM  in  the  discrete  setting,  we  pixelate  this 
neighborhood  by  partitioning  Is{t\,t2)  into  n 2  subregions  Ip ,22  =  —  h  + 

x  [^2  —  ^2n'  ^2  T  ^].  The  pixelated  neighborhood  of  (t\,t2)  is  defined  as 
y \  t2  e  Rnxn  and  satisfies 


yti.taC7i.j2)  =  [  dY(si,  s2). 


(si,s2)eiJs 

We  further  define  the  pixelated  process  as 

,2 


7l  f 

X(ti,  t2)  —  —  /  dY(si,s2). 

"  J(si>s2)e/a/„(ti,t2) 

Define  the  ^-neighborhood  distance  between  two  points  in  the  image  as 


dj(dY(ti,t2),  dY(si,  s2)) 


n 2  - 


j  (llyfltta  -  yfi.«ll2  -  lyti,t2(°.°)  -  yf1)S2(M)l2) 


(4) 


where  ||y^  1||  =  t2 (i,  j)-  Note  that  in  contrast  to  the  definition  in 

[7],  we  have  removed  the  center  element  |y£  t2 (0, 0)  —  yf:  S2(0,0)|2  from  the 
summation.  Since  we  assume  that  n2  — >  oo  as  a  — >  0,  the  effect  is  negli¬ 
gible  on  the  asymptotic  performance.  But,  as  we  will  see  below,  removing 
the  center  element  simplifies  the  calculations  considerably.  NLM  uses  the 
neighborhood  distances  to  estimate 

iNu  4 

.1  (Ll  ,  t2) - f - jy - 7 - — - J - , 

J(Sl,S2)es<ut2^u^)d,s2ds2 

where  w^t2(si,  s2)  is  set  according  to  the  ^-neighborhood  distance  between 
d,Y (ti,  t2)  and  dY (sl5  s2)-5  It  is  straightforward  to  verify  that  E dj(dY (t\,  t2), 


5 We  assume  that  both  t2(si,  s2)X(si,  s2)  and  t2{si,  s2)  are  Lebesgue  integrable 
with  high  probability. 
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d,Y(si,s2))  —  dj(f(t1,t2),f(s1,s2))  +  ^f~,  which  suggests  the  following 
strategy  for  setting  the  weights: 


wh,t2(si’  ss) 


1  Hd^dYit^.dY^s,))  <  ^  +  ra, 
0  otherwise, 


(5) 


where  t„  is  the  threshold  parameter.  Soft/tapered  weights  have  been  explored 
and  are  often  used  in  practice  [7].  However,  the  above  untapered  weights 
capture  the  essence  of  the  algorithm  while  simplifying  the  analysis. 

The  distinguishing  feature  of  NLM  the  weighted  averaging  of  pixels 
based  on  the  neighborhoods  produces  a  decay  rate  that  is  superior  to  that 
linear  filters  as  shown  in  [8].  However,  compared  to  the  optimal  rate  of  a4/3, 
NLM  remains  suboptimal.  We  introduce  the  anisotropic  NLM  algorithm  in 
the  next  section  to  address  this  gap  in  performance. 


3.  Anisotropic  nonlocal  means  denoising 

In  this  section,  we  introduce  the  Anisotropic  Nonlocal  Means  (ANLM)  al¬ 
gorithm  concept  that  exploits  the  smoothness  of  edge  contours  via  anisotropic 
neighborhoods.  We  then  present  OANLM  and  DANLM  algorithms  and  ex¬ 
plain  their  performance  guarantees. 


3.1.  Directional  neighborhoods 

We  now  formally  introduce  the  notion  of  directional  neighborhoods.  Then 
we  extend  NLM  to  exploit  such  neighborhoods.  This  will  lay  the  foundation 
for  the  OANLM  and  DANLM  algorithms. 

Let  Rltfl{-)  represent  the  rotation  operator  on  a  neighborhood;  when  ap¬ 
plied  to  a  generic  point  (u,  v)  e  S  in  the  neighborhood  around  {u,  ft),  R®  fl 
rotates  (u,  v)  by  0°  counter-clockwise  around  the  point  (u.  ft).  For  a  set  Q  C  S 
we  define  Q„  =  Reu^(Q)  as 

(s,t)  e  Qo  3(u,u)  G  Q  such  that  (s,t)  =  R°„^{{u,v)). 


The  (6,  S s,  8 i)- anisotropic  neighborhood  of  the  point  (ti,t2)  is  defined  as 


h,sa,st{t  uh)  —  R°tu 


t-2 


( 

<5 s  ^ s 

( 

2’fl+  2. 

X 

u~rh+2_ 

ns. 


Figure  3:  The  anisotropic  neighborhood  Ig,ss,se(ti,  £2)  in  the  discrete  setting 
for  two  different  pixels  of  a  Horizon  class  image. 


where  0.  Se,  and  Ss  (Ss  <  Se)  represent  the  orientation  angle,  length,  and  width 
of  the  neighborhood,  respectively.  Figure  3  displays  such  neighborhoods  for 
two  different  pixels.  Each  element  of  the  NLM  partitioning  of  Io.ds,s, (ii,  t2) 
into  ns  x  nf  pixelated  regions  has  a  corresponding  element  in  the  partitioning 
of  neighborhoods,  given  by  I3g%%(ti,t2)  =  R°lutMi  ~  x  [*i  ~ 

■M±  +,  -l  .Mill 

In.,  +  2na  J  > ' 

The  pixelated  neighborhood  of  (ti,  t2)  is  defined  as  the  y  G  E"3 xne  satis¬ 
fying 

ytfte'tiiih)  =  jj-  [  f  dY(si,s2). 


Define  the  neighborhood  process  as 

nsnt 


5sSe 


dY(si,s2). 


(«i,s2)e/  s  6,  (tite) 


0  -JL  _ L 

'  ns ’ n£ 


The  anisotropic  (Se,  Ss,  ^-neighborhood  distance  between  two  given  points  in 
the  image  is  then  defined  as 

d2e,sa,Se(dY^  1^2),  dY(s1,s2)) 

=  — (lly'A5,  - y"Afll2  -  |y£?4A(o, 0)  - (o,o)|2) .  (6) 

/  Lg !  L£  I  \  / 
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The  ANLM  estimate  at  point  is  given  by 


f0'5s’St(t  i,ta) 


/(8l,sa)65  (g1^2)A:(g1,  S2)dsid.S2 

f  f  Wt;%Se(s1,s2)ds1ds2 


where  the  weights  are  obtained  from 


w 


0,Ss ,Sf 
ti,t2 


(si,s2) 


1  if  4AA(dY(tut2),dY(Sl,s2))  + 

0  otherwise. 


(7) 


Here  rCT  is  the  threshold  parameter  and  2nflse°  +  rCT  is  the  threshold  value. 

ANLM  extends  NLM  in  two  significant  ways.  First,  in  ANLM  0.  Ss,  6( 
are  free  parameters,  while  in  NLM  0  =  0  and  Ss  —  S(.  As  Figures  4  and 
5  demonstrate,  these  free  parameters  significantly  affect  the  performance  of 
ANLM.  Figure  4  illustrates  the  visual  denoising  results  as  we  vary  the  angle 
of  the  anisotropic  neighborhood.  Figure  5  does  the  same  for  the  empirical 
peak  signal-to-noise  ratio  (PSNR).6 


3.2.  Oracle  ANLM 

It  is  clear  from  Figures  4  and  5  that  we  should  align  the  ANLM  neigh¬ 
borhood  locally  with  the  edge  to  maximize  denoising  performance.  In  the 
Oracle  ANLM  (OANLM)  algorithm,  we  assume  that  we  have  access  to  an 
oracle  that  knows  the  image’s  edge  orientation  at  each  pixel.  Consider 
fh(t  1O2)  £  Ha(C).  Let  h'(ti)  denote  the  derivative  of  the  edge  contour 
h(ti),  and  let  r7  =  { (f  1,  t2)  :  h  =  7}  segment  the  region  S.  Define  01  as  the 
angle  between  the  tangent  to  the  edge  contour  h  and  the  horizontal  line  at 
t,\  =  7.  OANLM  is  defined  as  ANLM  with  the  following  parameter  settings: 

•  Quadratic  scaling:  Instead  of  using  isotropic  (square)  neighborhoods, 
we  use  anisotropic  (rectangular)  neighborhoods  of  size  Ss  x  d(.  The 
scaling  of  5S  and  6(  depends  on  the  smoothness  of  the  edge.  Since 


6In  practice  images  are  measured  on  a  discrete  lattice,  i.e.,  the  pixelated  values  are 
known.  Let  the  pixelated  values  of  the  original  noise-free  image  and  the  estimator  be 
fij  and  fi  j,  respectively.  The  empirical  mean-squared-error  (MSE)  of  an  estimator  /  is 
defined  as  MSE  =  ~  fi.j)2-  h  fi-j  €  [0,  A\,  then  the  empirical  peak  signal-to- 

noise  ratio  (PSNR)  is  defined  as  101og10 
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(a)  0°,  PSNR  =  20.6dB 


(b)  45°,  PSNR  =  19.2dB 


(a)  90°,  PSNR  =  20.6dB  (d)  135°,  PSNR  =  28.9dB 


Figure  4:  Effect  of  the  anisotropic  neighborhood  orientation  angle  0  on  the 
denoising  performance  of  ANLM.  The  images  are  256  x  256  pixels  with  a 
true  edge  orientation  of  135°,  and  the  noise  standard  deviation  in  practical 
algorithms  is  £  =  0.5  (see  Section  4.1).  The  neighborhood  size  and  threshold 
parameter  are  Ss  —  1/256,  Sg  —  20/256  and  r  —  2.7£2,  respectively.  The 
angles  of  the  neighborhoods  (0)  and  the  resulting  PSNR  of  the  estimates 
are  noted  below  the  images.  There  is  a  substantial  subjective  and  objective 
improvement  in  the  performance  of  ANLM  when  the  neighborhood  is  aligned 
with  the  edge. 
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Figure  5:  Effect  of  the  anisotropic  neighborhood  orientation  angle  6  on  the 
denoising  performance  of  ANLM.  The  experimental  parameters  are  the  same 
as  in  Figure  4.  For  each  value  of  0  we  plot  the  result  of  10  Monte  Carlo 
simulations. 


we  are  primarily  interested  in  //"(C),  we  will  emphasize  the  quadratic 
scaling* 7  8S  =  4a4/3 1  log  a  | 4/3  and  5t  =  2a2/3 1  logrr|2/3;  that  is  8S  =  8j. 

•  Aligned  neighborhoods:  We  align  the  neighborhood  at  point  (ti,t2)  to 
the  orientation  0tl  of  the  edge  contour  at  (ti,h(ti)).  Thus  all  pixels 
in  a  column  have  the  same  orientation.  Figure  6  illustrates  such  a 
neighborhood  selection.8 

•  Logarithmic  pixelation:  We  assume  that  ns  x  rq  —  log2  a.  In  other 
words,  as  a  increases  the  resolution  of  the  pixelated  image  increases  as 
well. 


Theorem  2. 


If  f°  is  the  OANLM  estimator  with  the  threshold  tct  = 


'If  the  smoothness  of  an  edge  is  a  >  1,  then  the  optimal  neighborhood  size  is  given  by 

<5,  =  0(ct~2o/(“+1)|  logCT|2“/(“+1))  and  5f  =  |  log cr|2/(Q+1)). 

8In  smooth  regions,  where  the  pixel  neighborhoods  do  not  intersect  with  the  edge, 
neither  the  shape  nor  the  orientation  of  the  neighborhood  affect  the  denoising  performance. 
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Figure  6:  The  OANLM  neighborhoods  satisfy  the  quadratic  scaling  8S  — 
82  and  are  aligned  with  the  edge.  In  regions  far  from  the  edge,  isotropic 
neighborhoods  perform  just  as  well. 


then 


R(Ha(C),f°)  =  0(<r4^3|  log(r|4/3). 


The  proof  of  Theorem  2  can  be  simply  modified  to  provide  a  bound  for  the 
risk  of  NLM  as  well.  Let  8  =  y/8s8g,  where  8S  and  8g  are  as  given  in  OANLM 
algorithm. 

Corollary  1.  If  fN  is  the  NLM  estimator  with  the  threshold  Ta  =  2/^/|  log(cr)|, 
then 


R(Ha(C)JN)  =  0(a\\oga\). 

In  fact,  under  certain  mild  assumptions,  the  risk  of  NLM  can  be  lower 
bounded  by  !2(<r).  The  proof  is  similar  to  the  proof  of  Theorem  5  in  [8]. 
However,  for  the  sake  of  completeness  and  since  our  framework  differs  from 
[8],  we  overview  the  main  steps  of  the  proof  in  Appendix  B. 

Theorem  2  is  based  on  the  strong  oracle  assumption  that  the  edge  direc¬ 
tion  is  known  exactly.  Needless  to  say,  such  information  is  rarely  available  in 
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practical  applications.  Consider  a  weaker  notion  of  OANLM  that  lias  access 
to  an  estimate  0..  of  9~,  that  satisfies 

\0y  -9., |  <  0((j'3).  (8) 

OANLM  with  exact  edge  orientation  information  corresponds  to  3  =  oo 
in  this  model.  When  0  <  oo,  the  choices  Ss  =  <r2/3 1  log  cr | 2//3  and  Sg  = 
<r4/3 1  log  a | 4/3  are  not  necessarily  optimal.  The  following  theorem  character¬ 
izes  the  performance  of  OANLM  using  0. 


Theorem  3.  Let  the  OANLM  estimator  use  the  inaccurate  edge  ori¬ 
entation  0.,  satisfying  (8).  Set  the  neighborhood  sizes  to  Sg  = 
min(<72/3|  log2/'5  (t|,  <t_1+/3/2|  log<x|)  and  6S  =  cr'2  log2  a/Sg.  Then  the  risk  of 
the  estimator  with  ra  —  ‘If  \/\  log(<r)|  satisfies 

R(Ha(C),  f°)  =  0(5sPoly(|  logo))), 

where  Poly(|  log(j|)  is  a  polynomial  of  degree  at  m,ost  2  in  terms  of  \  logcr|. 

If  the  edge  estimate  is  exact,  then  the  result  simplifies  to  the  result  of 
Theorem  2.  However,  this  theorem  confirms  that  OANLM  algorithm  achieves 
the  optimal  rate  even  if  there  is  an  error  in  0  of  order  with  3  >  2/3. 


Corollary  2.  Let  the  OANLM  estimator  use  the  inaccurate  edge  orientation 
0-y  satisfying  (8).  Set  the  neighborhood  sizes  to  6g  =  M~  s  2|  log  a  and  Ss  — 
<r1+a/2|  log(r|.  Then,  the  risk  of  the  estimator  satisfies 

R{Ha(C),  f°)  =  0(<r1+«2 Polyfl  loga|)). 

This  corollary  suggests  that,  as  long  as  we  have  an  edge  orientation  es¬ 
timate  that  improves  as  the  number  of  pixels  increases,  OANLM  outper¬ 
forms  NLM.  Also  note  that,  as  0  decreases,  the  neighborhoods  become  more 
isotropic. 

3.3.  Discrete  angle  ANLM 

In  this  section,  we  introduce  the  Discrete-angle  ANLM  (DANLM)  algo¬ 
rithm  that  achieves  the  minimax  rate  without  oracle  information.  The  key 
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Figure  7:  DANLM  neighborhoods  around  one  pixel  location  for  q  =  4. 


idea  is  to  calculate  the  neighborhood  distance  over  several  directional  neigh¬ 
borhoods  and  fuse  them  to  obtain  a  similarity  measure  that  works  well  for  all 
directions.  As  for  OANLM,  set  Ss  —  cr4/3 1  log  (a))4/3  and  Se  =  (r2//3|  log(fx)  |2//3, 
and  let  q  —  ■ na~ 2/3.  Define  the  angles  Oo  =  0,  0\  =  <72/3,  02  =  2cr2/3, . ..  ,9q  = 
n  —  cr2/3.  For  a  point  {t\,t2)  we  consider  all  of  the  anisotropic,  directional 
neighborhoods  for  0  e  0.  Figure  7  displays  four  of  these  neighborhoods. 

Define  the  discrete  angle  anisotropic  distance  between  dY (t.i,  t2)  and 
dY(s1,s2) 

d2A(dY(tut2),  dY  (si,  s2))  =  mmd2gSs5e(dY(ti,t2),  dY  (s1?  s2)), 


where  dg S  Se(dY(ti,t2),  dY(si,  s2))  is  defined  in  (6).  The  DANLM  estimate 
is  then  given  by 


f(sus2)es  whKs^  S2)X(^U  S2)dSlds2 
f(suS2)esw£Nt2(siis2)dsids2 


where 

wf1Nt2{s1,s2) 


1  if  d2A{dY(t1,t2),dY{s1,s2))  <  ’-^g^  +ra, 
0  otherwise. 


(9) 


In  summary,  we  note  the  following  features  of  the  DANLM  algorithm. 
First,  it  uses  the  quadratic  scaling  Ss  —  S2.  Second,  the  optimal  neighborhood 
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direction  can  change  from  pixel  to  pixel.  The  following  theorem,  proved  in 
Section  5,  shows  that  the  risk  of  this  algorithm  is  within  the  logarithmic 
factor  of  the  minimax  risk. 


Theorem  4.  The  risk  of  DANLM  satisfies 

R(Ha(Ca),fD)  <  0(cr4/3|  log(cr)|4/3). 

Figure  8  reprises  the  simulation  experiment  of  Figure  4  using  the  four 
algorithms  described  thus  far:  NLM.  OANLM  with  perfect  knowledge  of  the 
edge  orientation,  OANLM  with  imperfect  knowledge  of  the  edge  orientation, 
and  DANLM  with  four  angles.  All  three  of  the  latter  algorithms  outperform 
the  isotropic  NLM. 


4.  Practical  ANLM  algorithms  and  experiments 

In  this  section  we  introduce  a  practical  gradient-based  ANLM  algorithm 
and  then  complement  the  above  theoretical  arguments  with  additional  sim¬ 
ulations  and  experiments  with  synthetic  and  real-world  imagery. 


4-1.  Extension  to  discrete  images 

In  practice  the  observations  are  noisy  pixelated  values  of  an  image  and 
the  objective  is  only  to  estimate  the  pixelated  values.  In  this  section  we 
explain  how  the  ideas  of  directional  neighborhood  and  ANLM  can  be  ex¬ 
tended  to  the  discrete  settings.  Suppose  we  are  interested  in  estimating  a. 
n  x  n  image  /(A  J)  with  noisy  observations  oitj  =  f(f,  +  where  (uj  ~ 
iY(0,£2)  and  f  is  the  standard  deviation  of  the  noise.  The  extension  of  the 
anisotropic  neighborhood  to  the  discrete  setting  is  straightforward.  Let  S  — 
x  and/  =  {1,2,..., n}  x  {l,2,...,n}. 

For  a  given  set  B  c  S  we  define  B  =  B  fl  S. 

The  discrete  (6,  Ss,  ^-distance  between  two  pixels  0{j  and  om  (  is  defined  as 


(^0,Ss,8e  °m,e)  —  -p  ^  ( °i+r,j+q  °m+r,e+q )  ? 


(r,q)eV 


(10) 
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(a)  NLM,  PSNR  =  26  dB 


(b)  OANLM  with  no  error, 
PSNR  =  28.9  dB 


(c)  ONALM  with  error,  (d)  DANLM,  PSNR  =  27.5  dB 

PSNR  =  28.6  dB 


Figure  8:  Continuation  of  the  experiment  of  Figure  4  that  compares 
(a)  isotropic  NLM,  (b)  OANLM  with  perfect  knowledge  of  the  edge  direc¬ 
tion,  (c)  OANLM  with  10%  error  in  the  knowledge  of  the  edge  direction, 
(d)  DANLM  with  q  =  4  angles.  The  images  are  of  size  256  x  256  pixels.  In 
the  ANLM  algorithms,  Ss  and  S(  are  the  same  as  in  Figure  4.  The  neighbor¬ 
hood  size  of  NLM  is  \A7  x  6(.  The  edge  orientation  is  135°.  The  standard 
deviation  of  the  noise  is  £  =  0.5. 

where  V  =  {( r,q )  G  Z2  |  (1Jri152)  G  Iefi.,st{i/n,j/n)}.  See  Figure  9.  The 
ANLM  estimate  at  pixel  /(A  is  given  by 
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*x 

Figure  9:  The  anisotropic  neighborhood  Io,5s,se{L,  ~)  hi  the  discrete  setting 
for  two  different  pixels  of  a  very  simple  Horizon  class  image. 


i-9,S3,S( 
■'  ijj 


where  the  weights  are  obtained  from  the  distances  d%s ,  ^(oi^v 
stance,  the  simple  policy  introduced  in  (7)  corresponds  to 


>m,e) 


For  in- 


(m,£)  =< 


if  <%,6s,6e  K 

otherwise. 


j’ 


Omjt)  <  2f2  +  Tn 


(11) 


Using  the  discrete  anisotropic  neighborhood  and  distance,  the  extensions  of 
OANLM  and  DANLM  to  the  discrete  setting  is  straightforward. 


4-2.  Gradient-based  ANLM 

While  DANLM  is  somewhat  practical  and  also  theoretically  optimal  for 
the  Horizon  class  of  images,  its  computational  complexity  is  higher  than 
NLM  and  grows  linearly  in  the  number  of  directions  q,  where  q  ~  o(n2/3).  As 
a  more  practical  alternative,  we  propose  Gradient  based  ANLM  (GANLM), 
which  adjusts  the  orientation  of  an  anisotropic  ANLM  neighborhood  using  an 
estimate  of  the  edge  orientation  provided  by  the  image  gradients.  Pseudocode 
is  given  in  Algorithm  4.2.  Note  that  GANLM  reverts  back  to  NLM  in  regions 
with  low  image  gradients,  since  they  will  not  be  “edgy”  enough  to  benefit 
from  the  special  treatment. 
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Algorithm  1  Gradient-based  ANLM  (GANLM) 

Inputs: 

fi  j  :  Estimate  of  the  image  pixel 
8S  x  Sg:  Size  of  the  neighborhood 

A:  Threshold  that  determines  isotropic/anisotropic  neighborhood  selection 


Estimate  image  gradient  (gh(i,j),  9v(i,j))  at  each  pixel  (i,j) 
for  every  pixel  (i,j)  G  I  do 

=  Vd2h(i,j)+gUhj) 


Oij  =  tan 


-l 


>9h(i,j). 


if  gi  >  A  then 

Perform  ANLM  at  pixel  tjij  with  ds^s.^.o, , 
else 

Perform  NLM  at  pixel  ytj  with  y/8sSg. 

end  if 
end  for 


There  is  a  rich  literature  on  robust  image  gradient  estimation  [23  25]. 
Most  simply,  if  f]h(i,j)  and  gv(i,j)  are  the  estimated  image  derivatives 
at  pixel  (i.  j),  then  we  can  estimate  the  local  orientation  of  an  edge  by 

0(i,j)  —  tan-1  •  To  allay  any  concerns  that  gradient-based  adaptivity 

is  not  robust  to  noise  and  errors,  we  recall  Theorem  3,  which  establishes  the 
robustness  of  OANLM  to  edge  angle  estimation  error.  For  extremely  noisy 
images,  numerous  heuristics  are  possible,  including  estimating  the  image  gra¬ 
dients  for  GANLM  from  an  isotropic  NLM  pilot  estimate.  Figure  10  builds 
on  Figure  8  with  a  slightly  more  realistic,  curved  edge  and  demonstrates  that 
the  pilot  estimate  approach  to  GANLM  performs  almost  as  well  as  an  oracle 
GANLM  that  has  access  to  the  edge  gradient. 

Table  2  summarizes  the  performance  of  the  algorithms  introduced  in  this 
paper  with  that  of  NLM  on  the  natural  test  images  Barbara  [26],  Boats  [27], 
and  Wet  Paint  [28]  submerged  in  noise  of  two  different  levels.  The  table 
demonstrates  two  facts.  First,  the  performance  of  the  practical  GANLM 
algorithm  is  very  close  to  the  oracle  GANLM  algorithm.  Second  these  two 
algorithms  outperform  DANLM  in  all  but  one  case  and  significantly  out- 
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(a)  Horizon  image  (b)  Noisy  image,  (c)  NLM,  PSNR  =  34.4dB 

PSNR  =  16.5dB 


(d)  DANLM,  (e)  Oracle  GANLM,  (f)  GANLM,  PSNR  =  38dB 

PSNR  =  36.2dB  PSNR  =  38dB 


Figure  10:  Simulation  experiment  in  the  same  vein  as  Figures  4  and  8  but 
with  a  slightly  more  realistic  C2  curved  edge.  The  images  are  of  size  512  x 
512  pixels.  In  the  ANLM  algorithms  8S  and  Si  follow  the  quadratic  scaling 
of  Theorem  2.  The  neighborhood  size  of  NLM  is  \/Ss  x  Sf.  The  standard 
deviation  of  the  noise  is  £  —  0.15. 
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Table  2:  Performance  of  NLM  and  ANLM  algorithms  on  three  test  images 
at  two  noise  levels. 


Test  image 

Algorithm 

$  =  0.25 

£  =  0.15 

NLM 

22.48 

25.86 

Barbara 

DANLM 

23.15 

26.27 

(512  x  512) 

Oracle  GANLM 

23.51 

26.63 

GANLM 

23.50 

26.60 

NLM 

22.75 

25.83 

Boats 

DANLM 

23.45 

26.45 

(512  x  512) 

Oracle  GANLM 

23.88 

26,45 

GANLM 

23.75 

26.35 

NLM 

27.66 

30.49 

Wet  Paint 

DANLM 

28.41 

30.75 

(1024  x  1024) 

Oracle  GANLM 

29.02 

31.18 

GANLM 

28.86 

31.07 

perform  standard  NLM  in  all  cases.  We  use  4  discrete  angles9  in  DANLM 
and  attribute  the  superior  performance  of  GANLM  over  DANLM  to  more 
accurate  selection  of  orientations. 


5.  Proofs  of  the  main  results 

5.1.  Preamble 

We  first  introduce  some  notation.  Define  the  following  partitions  of  the 
set  S: 

51  =  {{v, u)  |  u  >  h{v)  +  (1  +  C/2)5S}, 

52  —  I  h(v)  -  (1  +  Cj 2)Ss  <  u  <  h(v)  +  (1  +  C/2)5S}, 

53  =  {(w,  u)  I  u  <  h(v)  -  (1  +  C/2)5S}. 

It  is  important  to  note  that,  if  (£i,£2)  £  Si  and  tan(0)  =  h'(ti),  then 

Ie,ss,6e(ti,  £2)  does  not  overlap  with  the  edge  contour.  In  other  words,  the 
correctly  aligned  neighborhood  of  (£1 ,  £2)  £  Si  is  always  above  the  edge.  The 
points  in  S3  satisfy  a  similar  property.  This  is  clarified  in  Figure  11. 


^Experiments  with  larger  q  did  not  differ  appreciably  in  PSNR  at  this  image  size. 
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Figure  11:  Regions  Si,  S2  and  S3 .  The  neighborhood  of  (u,  v)  G  S3  is  aligned 
with  the  edge  contour,  and  therefore  it  does  not  intersect  the  edge,  while  the 
neighborhood  of  (u',  v')  is  not  aligned  and  therefore  may  intersect  with  the 
edge.  The  neighborhoods  of  the  pixels  in  S2  may  intersect  the  edge  even 
though  they  are  correctly  aligned. 


We  further  partition  Si  into  Pi  and  P>  and  S3  into  P3  and  P4  such  that 

Pi  ±  {(«,  u)  |  h{y)  +  2 5t  +  C/2Ss  <  u}, 

P2  4  {(v, u)  |  h(v)  +  (1  +  C/2)5S  <u<  h(v)  +  25 e  +  C/25s}, 

P3  =  {(v,  u)  |  h(v)  -  (1  +  C/2)5S  >u>  h(v)  -  25 e  -  C/25s}, 

Pa  —  {(yiM)  I  u  <  h(v)  —  25 £  —  C/25s}. 

Lemma  1.  Any  neighborhood  of  pixel  (v,  u)  G  Pi  will  lie  completely  above 
the  edge  contour. 

Proof.  Let  (ti,t2)  G  Px.  If  (u,v)  G  then 

t2  -  v  <  5e.  (12) 

On  the  other  hand,  for  t\  G  [t x  —  5(,  tx  +  5f]  we  have, 

h(t i)  =  h(t  1)  +  h'(h)(ti  -  h)  +  -  h)2, 
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t. 
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a. 


a, 


Figure  12:  The  regions  Pi,  P2,  P3,  Pi  with  as  =  (1  +  C/2)8S  and  Qg  — 
28 1  —  8S.  Every  neighborhood  of  {t\,t2)  £  P\  will  lie  completely  above  the 
edge  contour.  However,  some  of  the  neighborhoods  of  the  pixels  (H,  t2)  £  P> 
may  intersect  the  edge.  A  similar  property  holds  for  the  regions  P:5  and  P4. 


where  t'{  is  between  t \  and  t[.  Therefore, 


(13) 


Comparing  (12)  and  (13)  completes  the  proof. 


□ 


I11  spite  of  Pi,  some  of  the  neighborhoods  of  the  pixels  (v,u)  e  P>  may 
intersect  the  edge.  A  similar  property  holds  for  P5  and  P4,  respectively. 
Figure  12  displays  these  regions.  The  following  lemma,  proved  in  [8],  will  be 
used  in  the  proofs  of  the  main  theorems.  For  the  sake  of  completeness,  we 
sketch  the  proof  here. 

Lemma  2.  Let  Z\,Z2, ,  Zr  be  iid  N(0, 1)  random  variables.  The  xf-  ran¬ 
dom  variable  concentrates  around  its  mean  with  high  probability, 


i.e. 


P 


P 


(14) 


(15) 
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Proof.  Here  we  prove  (14);  the  proof  of  (15)  follows  along  very  similar  lines. 
From  Markov’s  Inequality,  we  have 


=  e"*-*  E  e 


Q-vt-v 

(i-?) 


The  last  inequality  follows  from  Lemma  3  in  [8].  The  upper  bound  proved 
above  holds  for  any  r/  <  To  obtain  the  lowest  upper  bound  we  mini- 
over  r;.  The  optimal  value  is  rf  =  arg  min,. 

Plugging  rf  into  (16)  proves  the  lemma.  □ 


rt 
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5.2.  Proof  of  Theorem  2 

In  the  set  up  above,  we  consider  three  different  regions  for  the  point 
(t\,t2).  As  we  will  see  in  the  proof,  the  risk  of  all  pixels  in  each  region  has 
the  same  upper  bound.  We  calculate  these  upper  bounds  and  then  combine 
them  to  obtain  a  master  upper  bound  for  the  risk  of  OANLM. 


Case  I:  (ti,  t2)  G  S\.  We  know  that  if  the  anisotropic  neighborhood  of  (i1}  t2), 
h.8s.se  is  aligned  with  the  edge  contour,  i.e. ,  tan(d)  =  h'(ti),  then  it  does 
not  intersect  the  edge  contour.  To  calculate  the  OANLM  estimate  we  first 
calculate  the  weights.  Define 


Ztf,t2lUuh)  =  /  dW{sus2),  (17) 

S*S‘  J(sus2)eWs% 


and 


Z(h,t2)  -  /  /  dW(si,s2), 

J  J{.si,sz)el  Ss_  Se(ti,t2) 

'  ns  ’  ng 

F{tut2)  =  yy1  [  [  f(si,s2)dsids2. 

°sOe  J  .)(SltS2)ei^  Sj.(tut2) 
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For  notational  simplicity  we  will  use  w(si,s2)  and  ztlit2(ji,  j2)  instead  of 
w°tulfe(s  i,s2)  and  z 0tfj?e{juj2),  respectively.  Define 

=  {(u,w)  G  Pi  |  w(u,v)  =  1}, 

A2  =  {(it, v)  e  Pi  |  w(u,v)  =  0}. 

Here,  let  A(-)  denote  the  Lebesgue  measure  of  a  set  over  R2. 

Lemma  3.  Let  5g  =  2<72/3|  log(<x)|2/3  and  Ss  —  4<r4/3|  log(<r)|4/3,  ns  = 

2|  log(fr) |2/3  n(  =  |  log(<7)|4/3.  and  ra  —  ,  2  Then, 

V 1  log  I 

p(A(p1)-a(a1)>€)  =  O£0, 

P(A(P4)  -  A(A2)  >  e)  =  o(^pj. 


Proof.  Here  we  prove  the  first  inequality.  The  proof  of  the  second  inequal¬ 
ity  follows  a  similar  route.  Since  ztlit2(ji,  j2)  is  the  integral  of  the  Wiener 

2 

sheet,  we  have  ztl,t2(ji,j2)  ~  N(0,  n‘^  )•  Combining  this  with  Lemma  2  we 
conclude  that 


Therefore, 


E(A(.4))  =  E  [  I({u,v)eA)=  f  P((it,  v)  g  A) 

J(u,v)ePi  J  (u,v)ePi 

=  A(Pi)  -  0(<78). 


(18) 


An  upper  bound  for  P(A(Pi)  —  A (.4)  >  e)  using  the  Markov  inequality  yields 
the  result 


P(A(Pi)  -  \{A)  >  e)  < 


E(A(P 1)  -  A(A)) 
e 


□ 
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Define  the  event  £  as 


£  =  {A (A)  -  A(A)  <  a2}  n  {A(P4)  -  A(A)  <  a2}. 

For  notational  simplicity  we  also  define  the  following  notations: 


wX  dudv 

w(u,  v)X (u,  v)dudv, 

X  dudv 

A 

X(u,  v)dudv, 

wdudv 

A 

w(u ,  v)dudv. 

Zdudv 

A 

Z(u,  v)dudv , 

w  Z  dudv 

A 

w(u,  v)Z(u ,  v)dudv. 

The  risk  of  the  OANLM  estimator  at  (t\ .  t2)  G  Si  is  then  given  by 

fs  wXdudv 
fs  wdudv 

Define  U  =  We  have 

E{U)  =  E (U  |  £)P(£)  +  E(U  |  £C)F(£C)  <  E(U  \  8)P{£)  +  F{£c).  (19) 

Lemma  3  proves  that  P(£c)  -  0(a6).  We  have 


wXdudv  —  /  Xdudv 

Pi  J  Pi 


wXdudv  —  /  wXdudv 
Pi  JAi 


wXdudv  —  /  wXdudv 
Pi  ->Ai 

0{\{P\\Ai)). 


Xdudv  —  /  Xdudv 
Ai  JPi 

/  Xdudv  —  /  Xdudv 

' A i  JPi 


(20) 


For  the  last  inequality  we  have  assume  that  the  estimate  is  bounded  over 
the  P\  region.  This  assumption  can  be  also  justified  by  calculating  the  prob¬ 
ability  that  this  condition  does  not  hold  and  showing  that  the  probability 
is  negligible.  Using  arguments  similar  to  (20)  it  is  straightforward  to  prove 
that 
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wXdudv  —  /  Xdudv 
p4  ■!  P\ 

wdudv  —  /  dudv 
Pi  J  Pi 


wdudv  —  /  dudv 
p4  J  Pi 


=  0(A(P4\A2)). 
=  0(X(P1\A1)). 
=  0(A(P4\A2)). 


(21) 


Define  P14  =  Px  U  P4  and  B  =  S\Pu.  We  now  calculate  an  upper  bound 
for  the  first  term  of  (19) 


E (U  |  £)P(£) 
=  E 


fPi  i  wXdudv  +  f B  wXdudv\ 
jPi  i  wdudv  +  fB  wdudv  J 


£  P(<?) 


=  E 


fp  Xdudv  +  [B  wXdudv  V 


fPi  dudv  +  fB  wdudv  J 
Xdudv  +  f„wXdudv\ 

—r - -  +  0(a 

JPi  dudv  +  JB  wdudv  J 


£  P(<?)  +  0(a2) 


<  E 


fB  wF dudv  \ 2  /  fPi  Zdudv  +  jB  wZdudv\  2 

fp  dudv  +  fB  wdudv  J  l  JPi  dudv  +  fB  wdudv  J 


(fp  Fdudv-\ -  JBwFdudv\  2  f  JPi  Zdudv  +  fB  wZdudv\  2 

fPi  dudv  +  fB  wdudv  J  \  l  fp  dudv  +  fB  wdudv  J 


0(a 2). 


(22) 


Lemma  4.  Let  w{u.v)  be  the  weights  of  OANLM  with  S(,Ss,ns,ri(,  and  ra 
as  specified  in  Lemma  3.  Then 


E 


2 

(fn  wF  dudv  \  ,  /■>, 

7 — ,  ,  ,  r - j—r  -  °(a‘  I  losM' '  )• 

JPi  dudv  +  JB  wdudv  j 
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Proof.  We  have 


E 

=  (m)2  =  0(ff4/3|IogM|4/3)- 

To  obtain  the  inequality  we  maximize  the  numerator  and  minimize  the  de- 
nomerator  independently.  □ 

Lemma  5.  Let  w(u,v)  be  the  weights  of  OANLM  with  6e,5s,ns,ne,  and  t„ 
as  specified  in  Lemma  3.  Then 


fB  wFdudv 
fp  d,ud.v  +  fD  wdudv 


L  Zdudv  +  fRwZdudv\2  ...  ... 

Jp‘  .  , - rB  ,  ,  <  0(ct  /3|log(t>-)|4/3). 

fPi  dudv  +  fB  wdudv  I 


Proof.  Since  fB  wdudv  >  0,  and  we  are  interested  in  the  upper  bound  of  the 
risk,  we  can  remove  it  from  the  denominator  to  obtain 


fPi  Zdudv  +  JBw Zdudv 
fPi  dudv  +  fB  wdudv 

(  fp  Zdudv\ 

<  E  ^ -  +  E 

V  Jpj  dudv  I 


fp  Zdudv  +  JB  w Zdudv 


fp  dudv 


fB  w  Zdudv 
fPi  dudv 


2 


Now  we  obtain  upper  bounds  for  V\ .  V2 .  and  V3  separately.  First,  we  have 


V1  =  E 


fPi  Zdudi A 
fPl  dudv  J 


=  E 


fPi  fPi  Z(u,v)Z(u',v,)dudvdu'dv' 


Pi 


dudv 


(  fpi  ft  a4<  (u,v)  E(Z (u’  v)Z(u',  v'))dudvdu'dvr 

ns  '  ti£ 


Pi 


dudv 


< 


\ 

/  r  r  'Iff!  dudv  da' dv1 

Jpi  Jt  2Ss  24£(,,'B)  °s°e 
•  ns  '  I|£ 


V 


Pi 


dudv 


/ 


=  0 


Me 

nsrie 


To  obtain  an  upper  bound  for  V2,  we  first  note  that 
E(iu(ii,  v)Z(u,  v)w(u',  v')Z(u' ,  i>')) 

<  y/E(u;(u,  v)Z(u,  v))2y/E(w(u',  v')Z(u'  ,v'))2  = 


nsriea 

SsS( 


and  hence, 

V2  —  E 

=  E 

< 


fB  wZdudv\ 
fPl  dudv  J 

Ib  in  w(u,  v)Z(u,  v)w(u',  v')Z(u',  v')dudvdu'dv' 


(fp  dudv)2 


JBJB^fdudvdu'dv' 
(fPi  dudv)2 


=  0(6}). 


(23) 


(24) 


(25) 


Using  (23)  and  (25)  it  is  straightforward  to  obtain  an  upper  bound  for  V3. 
Combining  the  upper  bounds  for  V[ .  V2,  and  V3  completes  the  proof.  □ 

Combining  (22)  with  Lemmas  4  and  5  establishes  the  following  upper 
bound  for  Case  I,  where  (U ,  t2)  G  S\: 


E  (f(tut2)- 


fs  w(u,  v)X(u,  v)dudv ^ 


fs  w(u,  v)dudv 


J  =0(<x4/;?|log(<7)|4/3).  (26) 
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Case  II:  (p,  t2)  G  S2.  In  this  case  we  assume  that  the  risk  is  bounded  by  1, 
since  the  function  /  is  bounded  between  0  and  1.  If  the  estimate  is  out  of 
this  range,  then  we  will  map  it  to  either  0  or  1. 

Case  III:  (ti,t2)  G  .S3 .  This  case  is  exactly  the  same  as  Case  I  ,  and  hence 
we  skip  the  proof. 

Finally,  combining  our  results  for  Cases  I,  II,  and  III,  we  can  calculate  an 
upper  bound  for  the  risk  of  OANLM  as 

Rif, h  =  e(h/  -  r  112)  =  [  (/  -  h2dhdt2  +  [  (/  -  rfdhdt2 

j  S1US3  j  S2 

=  A(S1US3)0(<T4/3|log((r)|4/3)  +  A(52)0(l)  =  0(<T4/3|log(<T)|4/3). 

So  ends  the  proof  of  Theorem  2. 

5.3.  Proof  of  Corollary  1 

The  proof  of  this  corollary  follows  exactly  the  same  route  as  that  of  The¬ 
orem  2.  We  merely  redefine  the  regions  Sk  and  P*.  for  k  G  {1, 2, 3}.  The  new 
definition  of  the  S/~  regions  for  the  NLM  algorithm  is  given  by 

S'i  =  {{ti,t2)  I  t'2  >  h(t,i)  +  25}, 

Sf  =  {{tut2)  |  t2  <  hfC)  -  25 }, 

and  S2  —  S\Si  U  Sf.  Since  the  neighborhoods  in  NLM  are  isotropic,  we 
require  a  single  parameter  to  describe  the  neighborhood  size,  5  —  5S  —  5(. 
The  main  assumption  is  that  (-5 2  <  a.  This  is  clear  since  5  is  assumed 
to  go  to  zero  as  a  — »  0.  Therefore  in  the  isotropic  case,  the  neighborhood 
of  the  pixels  in  S'i  and  S3  does  not  intersect  with  the  edge  contour.  Since 
the  neighborhoods  no  longer  have  different  anisotropic  lengths,  the  size  of 
the  intersection  of  the  neighborhoods  and  the  edge  contour  is  identical  in  all 
directions  and  there  is  no  need  to  define  the  regions,  Pi, ....  P4.  Equivalently, 
Pi  =  Si,  P4  —  S4  and  P2  =  P3  =  0.  With  these  definitions  the  proof  of  this 
corollary  is  exactly  the  same  as  above. 

5-4-  Proof  of  Theorem  3 

The  proof  of  this  theorem  is  very  similar  to  the  proof  of  Theorem  2.  The 
only  difference  is  in  the  definitions  of  Sk  and  P/,.  for  k  G  {1,2,3}.  Since 
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there  is  a  mismatch  between  the  orientations  of  the  neighborhood  and  the 
edge  contour,  the  neighborhood  of  S\  may  intersect  the  edge.  In  order  to  fix 
this,  we  define  the  new  regions  called  Sf  and  Pf .  If  the  error  in  0  is  upper 
bounded  by  Cpo /3,  then  define 

S\  —  {(^1  > ^2)  I  ^2  >  h{t\)  +  Cpo3 8 £  +  8S  +  C/28}}, 

S'(  =  {(*i, *2)  I  h  <  h(ti)  -  cpaij8e  -83-  C/28}}, 

and  S'/  =  S\S'i  U  Sf .  Furthermore,  define  Pf  =  Pi,  Pf  =  P4,  P/  =  Sf\P4, 
and  P-/  =  S’/\P\ .  Using  these  new  partitions,  the  proof  follows  exactly  the 
same  route  as  the  proof  of  Theorem  2. 


5. 5.  Proof  of  Theorem  4 


Since  the  proof  is  mostly  similar  to  that  of  Theorem  2,  we  shall  focus  on 
the  major  differences.  The  first  difference  is  that  we  consider  the  regions  Pi 
Pi  instead  of  Si  S3.  Previously  the  regions  Pi  and  P2  were  treated  jointly 
under  the  region  Si-  Instead,  we  shall  now  consider  Pi  and  P2  separately, 
and  their  differences  shall  become  progressively  apparent. 

Case  I:  (UT2)  £  Pi-  We  start  with  the  calculation  of  the  weights  w(u,  v)  for 
(u,  v)  e  Pi  U  Pi-  Define 

R 1  =  {(m,v)  <E  Pi  I  wD(u,v)  =  1}, 

P2  —  {(u,v)  ^  P4  I  WD(u,V )  =  0}. 

Lemma  6.  Let  8s,8f,ns,rii,t  be  as  defined  in  Lemma  3  and  let  q  =  ira~2/i 
in  the  DANLM  algorithm.  Then, 


P(A(Pi)-A(Pi)>c)  < 
P(A(P4)  -  A(P2)  >  e)  < 


7R722/3 

7T(722/3 

6 


The  proof  is  a  simple  application  of  Lemma  3  and  the  union  bound.  Note 
that  since  (/-i,i2)  £  Pi,  all  of  its  directional  neighborhoods  are  located  above 
the  edge  and  do  not  intersect  with  the  edge  contour. 

Define  the  event  T  as 

7  =  {A(Pi)  -  A(Pi)  <  a2}  n  {A(P4)  -  A(A0  <  a2}. 
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Lemma  7  confirms  that  P(.FC)  =  0(< 716/3).  The  rest  of  the  proof  of  Case  I  is 
exactly  the  same  as  the  proof  of  Case  I  in  Theorem  2,  and  therefore  we  do 
not  repeat  it  here. 

Case  II:  (0.  t2)  £  Pi-  I11  this  case  we  again  start  with  defining  the  following 
two  sets: 


Li  =  {(u,v)  G  Pi  |  w^v  =  1}, 

U  =  {M)€P4|<  =  0}. 

Lemma  7.  Lei  Ss,Sg,  ns,  ne,  Ta  be  as  defined  in  Lemma  3,  and  let  q  —  7 tct-2/3 
Then, 


P(A(P1)-A(L1)>e)  < 


7T(722/3 

e 


Proof.  If  we  prove  that  for  some  0  G  0,  Ie,sa,st{t\,  t2)  does  not  intersect  the 
edge,  then  the  proof  of  the  lemma  is  immediate.  Suppose  that  0*  is  such  that 
tan(0*)  —  h'(ti),  and  consider  0  —  argmin^e©  \0  —  0*\.  To  ensure  that  the 
neighborhood  Ig,ss,se(t ifi-2)  does  not  intersect  with  the  edge,  we  need  to  have 

t2  +  tan(0)  (t!  -  ti)  —  Ss  >  h(ti)  +  h'(ti)  (t'  -  ti) 

for  all  t!  G  [ti,ti  +  Sg).  Also,  t"  G  [ti,t'].  Clearly,  tan(0)  =  tan(0*)  + 
i+tan2(g')  A^,  where  A0  =  0  —  0*.  Furthermore,  \h"{t")\  <  C2  and  A0  <  | Sg. 
Therefore,  (27)  will  be  satisfied  if 

t2  >  h  (ii)  H — —  +  C2Sg. 

Tliis  constraint  holds  for  all  pixels  in  the  region  P2.  Therefore,  at  least  one  of 
the  neighborhoods  is  completely  in  the  region  above  h ,  which  in  turns  proves 
the  lemma.  □ 


Lemma  8.  Let  Ss,Sg,  ns,  n g,  ra  be  as  defined  in  Lemma  3,  and  let  q  =  na  2/3 
Then, 


P(A(P4)  -  A(L2)  >  e)  < 


Tra10'3 

e 
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Figure  13:  Explanation  of  the  neighborhood  rotation  in  Lemma  8. 


Proof.  We  first  prove  that  at  least  half  of  the  area  of  Iojs,se(u.v)  is  located 
below  the  edge.  For  notational  simplicity  we  assume  that  h'(t\)  =  0.  Con¬ 
sider  Figure  13,  in  which  the  neighborhood  has  an  arbitrary  orientation.  Let 
w,  v  be  as  defined  in  this  figure.  We  then  have  that 


d 


Cw2 


v  — 


cos  0  cos  6 


—  <L  tan  0  = 


d  —  Cw 2  —  (L  sin  0  d  —  Cw2  —  S , 


cos# 


> 


cos# 


(28) 


where  d  —  £2  —  h{t\).  Suppose  d  is  such  that,  at  the  correct  angle  d  —  Cw'2  — 
Ss  >  0,  i.e.,  d  >  (C  +  l)rr4/,;!.  According  to  (28),  v  >  0,  arid  hence  more  than 
half  of  the  area  of  the  neighborhood  is  in  the  white  region.  Therefore,  the 
number  of  pixels  in  this  region  is  4  log2  a  ±  o(log2  a).  □ 


Case  III:  (£1,^2)  £  SC  The  proof  of  this  case  is  identical  to  that  of  Case  II 
in  the  proof  of  Theorem  2. 

Case  IV:  (£1,  t2)  £  P?,  U  P\.  The  proof  of  this  case  is  similar  to  that  for  the 
regions  P\  and  in  Cases  I  and  II  and  therefore  is  skipped  here. 

Finally,  combining  the  upper  bounds  obtained  in  Cases  I,  II.  Ill,  and  IV 
completes  the  proof  of  the  theorem. 


6.  Related  work  on  anisotropic  denoising 

Anisotropy  is  a  fundamental  feature  of  real-world  images.  As  a  result, 
anisotropic  image  processing  tools  can  be  traced  back  at  least  as  far  as 
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the  1970s  [29].  Here,  we  briefly  compare  and  contrast  some  of  the  relevant 
anisotrpic  denoising  schemes  with  ANLM. 

Anisotropic  filtering  methods  use  a  space- varying  linear  convolution  filters 
to  reduce  blur  along  image  edges.  Toward  this  goal,  [29]  considers  several 
different  neighborhoods  around  a  pixel  and  selects  the  most  “homogeneous” 
neighborhood  to  smooth  over.  A  more  advanced  version  of  this  idea  can 
be  found  in  [30].  There  are  major  differences  between  such  algorithms  and 
NLM/ANLM.  in  particular,  the  estimators  are  local  and  do  not  exploit  global 
similarities. 

Anisotropic  diffusion  methods  smooth  a  noisy  image  with  a  Gaussian 
kernel.  As  the  standard  deviation  of  the  kernel  increases,  the  smoothing 
process  introduces  larger  bias  to  the  edges.  In  [31,  32]  the  authors  proved 
that  the  set  of  images  derived  by  this  approach  can  be  viewed  as  the  solution 
to  the  heat  diffusion  equation.  Perona  and  Malik  [33]  noted  the  isotropy 
in  the  heat  equation  and  introduced  anisotropy.  Their  anisotropic  diffusion 
starts  with  the  heat  equation  but  at  every  iteration  exploits  the  gradient 
information  of  the  previous  estimate  to  increase  the  conductance  along  the 
edges  and  decrease  it  across  the  edges.  Efforts  to  theoretically  analyze  the  risk 
of  this  algorithm  have  left  many  open  questions  remaining  [14].  It  is  worth 
noting  that  the  idea  of  applying  an  image  denoising  algorithm  iteratively 
and  guiding  it  at  every  iteration  based  on  previous  estimates  goes  back  to 
Tukey’s  twicing  [34]. 

Anisotropic  transformations  enable  simple  threshold  based  denoising  al¬ 
gorithms.  While  the  standard  separable  wavelet  transform  cannot  exploit 
the  smoothness  of  the  edge  contour,  a  menagerie  of  anisotropic  transforms 
have  been  developed,  including  ridgelets  [15],  curvelets  [16],  wedgelets  [9], 
platelets  [17,  35],  shearlets  [18],  contourlets  [19],  bandelets  [20],  and  direc- 
tionlets  [21].  As  mentioned  in  the  Introduction,  among  the  above  algorithms 
only  wedgelets  can  obtain  the  optimal  minimax  risk  for  the  Horizon  class; 
however  wedgelets  are  not  suited  to  denoising  textures.  One  promising  av¬ 
enue  combing  wavelets  (for  texture  denoising)  and  wedgelets  (for  edge  de¬ 
noising)  could  follow  the  path  of  the  image  coder  in  [36] . 

Alternative  anisotropic  NLM  algorithms  have  been  proposed  to  address 
the  inefficiency  of  using  a  fixed  neighborhood.  In  [37,  38]  the  authors  adapt 
the  neighborhood  size  to  the  local  image  content  but  do  not  provide  any 
optimality  analysis.  In  [38]  the  authors  consider  different  isotropic  NLM 
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neighborhood  sizes  depending  on  how  smoothly  the  image  content  varies. 
In  [37]  the  authors  do  not  change  the  neighborhood  size  (which  is  critical  to 
acheive  optimality)  but  rather  use  image  gradient  information  to  increase  the 
weights  of  the  NLM  along  edges  and  decrease  them  across  edges.  This  method 
is  equivalent  to  modifying  the  threshold  parameter  tn  to  force  NLM  to  assign 
higher  weights  to  edge-like  neighborhoods.  Unfortunately,  this  technique 
does  not  reduce  the  bias  that  renders  NLM  sub-optimal. 

Finally,  data-driven  optimality  criteria  have  been  considered  in  [39  41], 
where  the  authors  derive  lower  bounds  for  the  performance  of  denoising  algo¬ 
rithms.  However,  the  analyses  provided  in  these  papers  are  not  fully  rigorous 
and  do  not  cover  the  performance  of  NLM  for  images  with  sharp  edges. 

7.  Discussion  and  future  directions 

We  have  introduced  and  analyzed  a  framework  for  anisotropic  nonlocal 
means  (ANLM)  denoising.  Similar  to  NLM,  ANLM  exploits  nonlocal  infor¬ 
mation  in  estimating  the  pixel  values.  However,  unlike  NLM,  ANLM  uses 
anisotropic,  oriented  neighborhoods  that  can  exploit  the  smoothness  of  edge 
contours.  This  enables  ANLM  to  outperform  NLM  both  theoretically  and 
empirically.  In  fact,  the  performance  of  ANLM  is  wit  lit  in  a  logarithmic  factor 
of  optimal  as  measured  by  the  minimax  rate  on  the  Horizon  class  of  images. 

Numerous  questions  remain  open  for  future  research.  From  the  theoretical 
perspective,  the  risk  analysis  of  GANLM,  the  application  to  noise  models 
beyond  Gaussian,  and  the  extension  to  three  dimensions  and  beyond  (for 
seismic,  medical,  and  other  data)  pose  interesting  research  challenges.  From 
the  practical  perspective,  the  question  of  how  to  best  tune  ANLM  to  match 
the  nuanced  edges  and  textures  of  real- world  images  remains  open,  since  we 
have  considered  only  brutal  binary  images  here.  Finally,  while  NLM  is  no 
longer  the  state-of-the-art  denoising  algorithm,  it:  is  a  key  building  block  in 
several  top-performing  algorithms.  It:  would  be  interesting  to  see  whether 
anisotropy  pays  off  as  handily  for  those  algorithms  as  it  does  for  NLM. 
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Appendix  A.  Minimax  risk  of  the  mean  filter 


In  this  appendix,  we  obtain  the  decay  rate  of  the  risk  of  the  mean  Li¬ 
ter.  Our  proof  is  similar  to  the  proof  in  [14].  Nevertheless,  we  repeat  the 
proof  here  for  the  sake  of  completeness  and  since  our  continuous  framework 
is  slightly  different  from  the  discrete  framework  in  [14].  See  [8]  for  the  gen¬ 
eralization  of  this  result  . 

The  classical  mean  filter  estimates  the  image  via 


T\M 


T>), 


where  A  specifies  the  the  size  of  the  window  on  which  averaging  takes  place. 

Theorem  5  ([14]).  ///MF  is  the  estimate  of  the  mean  filter,  then 

inf  sup  R(f,fMF)  =  e(i 72'3). 

A  fena(c) 


Proof  We  first  derive  a  lower  bound  for  R(f,  /MF).  Consider  a  function 
fh(t\,t 2)  with  h(ti)  —  1/2  (recall  Figure  2)  and  define 

Qa  —  {(U; £2)  |  1/2  —  A  <  ^2  <  1/2  +  A}. 

It  is  straightforward  to  confirm  that  if  (£l5£2)  £  Qa/\,  then  |./),(U-  £2)  — 
E/MF(fi,/2)|  >  Therefore,  if  Bias(/MF)  denotes  the  bias  of  the  mean  Liter 
estimator,  then  we  have 

Bias2(/MF)  =  f  \fh{ti,t2)  -  EfMF(ti,t2)\2dtidt2 
Js 

>  [  \fh(U,b)  -E/MF(fi,f2)|2<i«1<Zf2  >  /;A.  (A.l) 

■lQA/4  62 
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Now  consider  a  point  (id.  /,2)  G  S\Q a-  We  have 
E(/MF(t1,ti)  —  E/MF(t1,i2))2 


E(fIIT(ib 


A 

2 


Ti,t2  -  r2)dW(ti 
A4 


-r'2)) 


A2 

Therefore,  if  Var(/MF)  is  the  variance  of  the  mean  filter  estimator,  then  we 
have 


Var(/MF)  > 


a~ 


S\Q4A 


E(/mf(£i,  -  E/MF(ii,  t2))2  =  — (1  -  4A).  (A.2) 


By  combining  (A.l)  and  (A.2)  we  obtain  the  lower  bound  of  ^  —  4^- 

for  the  risk  of  the  mean  filter  estimator.  If  we  minimize  this  lower  bound 
over  A  then  we  obtain  the  lower  bound  of  0(<x2/3)  for  the  risk. 

Now,  we  derive  an  upper  bound  for  the  risk.  Define  the  region 


Ra  —  {(£i?  £2)  |  h(h)  ~  A  <  t2  <  h(t. i)  +  A}. 


It  is  straightforward  to  confirm  that,  if  (ii,£2)  G  S\Ra,  then  the  A- 

neighborhood  of  this  point  does  not  intersect  the  edge  contour.  Therefore, 

2 

the  bias  of  the  estimator  over  this  region  is  zero  and  the  variance  is  .  If  we 
bound  the  risk  of  the  points  in  the  7?_^  region  by  1,  we  obtain  the  following 
upper  bound  for  the  risk, 

R(.f,  fMF)  <  ^  +  2A. 

Again  by  optimizing  the  upper  bound  over  A  we  obtain  the  upper  bound  of 
0(cr2/3).  This  completes  the  proof.  □ 


Appendix  B.  Minimax  risk  of  NLM 

Corollary  1  states  the  following  upper  bound  for  the  risk  of  NLM: 
sup  R(f,fN)  =  0(a\  log cr|). 

feHa{C) 
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Iii  this  section  we  prove  that  the  risk  of  NLM  is  lower  bounded  by  Q(cr).  The 
proof  is  similar  to  the  proof  of  Theorem  5  in  [8];  we  consider  the  function 
fh{ti,  t2)  for  h(t\)  =  1/2  and  prove  that  the  bias  of  the  NLM  on  (tl5 t2)  G  Q §_ 

n 

is  @(1)  for  any  choice  of  the  threshold  parameter.  However,  in  the  continuous 
setting  considered  here,  the  steps  are  more  challenging.  Following  [8]  we 
consider  the  semi-oracle  NLM  algorithm  (SNLM).  The  semi-oracle  5-distance 
is  defined  as 


d2s{dY(t1,t2),dY(s1,s2)) 

=  -rzrj  (lK,t2-yfi,Jl 


<,t2(0,0)-yf,,, (0,0)1 


6 

S1,S2  ' 


where 

xtuhtiuh)  =  /  .  .  /(sii  s2)ds\ds2. 

J  (si,s2)e/2'J2 

n 

SNLM  then  estimates  the  weights  according  to 


1  if  d$(dY (ti,  t2),  dY (si,  s2))  <  ^  +  rCT, 
0  otherwise. 


It  is  clear  that  the  distance  estimates  of  the  SNLM  algorithm  are  more  accu¬ 
rate  than  those  of  NLM  algorithm.  Hence  it  outperforms  NLM,  and  a  lower 
bound  that  holds  for  SNLM  will  hold  for  NLM  as  well. 

We  make  the  following  mild  assumptions  on  the  parameters  of  SNLM. 


Al:  The  window  size  5  — >  0  as  a  —¥  0. 

A2:  ^  -  fl(<r2).  Otherwise  Ei^g(dY(t1,t2),dY(si,s2))  — >  oo. 

A3:  n  — ^  oc  as  <r  — >  0.  This  ensures  that,  if  two  points  {t\.t2)  and  (si,s2) 
have  the  same  neighborhoods,  then  wtltt2(ui,  u2)  —  1  with  high  proba- 
bility. 

A4:  If  d(f(ti,t2),f(si,s2))  >  1/4,  then  F(w?l  t2(si,  s2)  =  1)  =  o(cr3). 

Again,  consider  the  function  for  h(ti)  =  1/2  (recall  Figure  2). 

Let  (ti,t2)  G  Qs/n-  For  notational  simplicity  we  use  w(si,s2)  instead  of 
w;^<2(si,s2)- 
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Lemma  9.  If  |sx  —  tx\  >8/2  and  |s'j  —  tx  \>  8/2,  then 

P(tifl>ta(*i,*2)  =  1)  =  F(wflit2(s[,s2)  =  1) 
for  any  t\,t2,sx,  s 2  and  sx . 

The  proof  is  straightforward  and  hence  skipped  here. 

Now  consider  a  point  (tx,  t2)  G  Gs/n- 

Lemma  10.  If  \s  —  tx\  >  8/2,  for  u  <  8/ 4,  then  we  have 

PK'UtM 2  -  u)  =  1)  =  P (w?ut2(s,t2  +  u)  =  1) 


This  lemma  is  a  straightforward  application  of  symmetry,  and  hence  we  skip 
the  proof. 

Lemma  11.  Suppose  that  8  and  ra  satisfy  Al  A/.  Then  we  have 


P 


(/  w(si,s2)dsids2  >  o 2 
Js\Qs/ 2  J 


=  o(a) 


Proof.  Define 

B  =  {(si,s2)  G  S\Qs/ 2  |  w(si,s2)  =  1} 

and  the  event 


Q  =  {A(J3)  >  a2}. 

Using  the  Markov  Inequality  and  Assumption  A4.  it  is  straightforward  to 
show  that 


P(C?)  —  p(f  w(u,v)dudv  >  cr2  J  =  o(a\(S\Qs/2))  —  o(a). 
S\Qs/ 2  J 


On  the  other  hand,  if  A (B)  <  a2,  then 


/  w(u,v)dudv  —  0(a2). 

Js\Qsr 


This  completes  the  proof. 


□ 
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Proposition  1.  Let  (t\,  t2),  (si,  s2)  G  Qs/n-  Then  there  exists  po  >  0  inde¬ 
pendent  of  a  such  that  for  every  tn,S  we  have 

P«i,t2(u>v)  =  !)  >  Po- 

Proof  We  need  to  demonstrate  that  regions  above  and  below  the  edge  con¬ 
tour  are  included  in  the  NLM  algorithm  with  a  constant  non-zero  probability 
Po.  First,  we  use  the  definition  of  the  weights  of  the  NLM  algorithm  to  de¬ 
termine  the  probability  that  w(u.  v )  —  1 


/  _  n2  2  \ 
PK,i2(si,s2)  =  1)  =  P  l  d25(dY{t1,t2),dY(s1,s2)  <  +  ra  \ 


where  S(p  —  z °sfs2(^,p)-  Using  the  Berry- Esseen  Central  Limit  Theorem  for 
independent  non-identically  distributed  random  variables  [42],  we  can  easily 
bound  this  probability  away  from  zero.  For  more  details,  see  Proposition  1 
in  [8].  □ 

We  now  consider  the  weights  for  the  regions  above  and  below  the  edge 
contour  separately.  Therefore,  we  define 


Q\ 

—  {(tl-. 

h) 

|  1/2  <  t2  <  1/2  + A/4}, 

Ql 

—  {(h- 

t2) 

|  1/2  -  A/4  <  t2  <  1/2}, 

^A,S 

—  {(U: 

h) 

|  1/2  <  t2  <  1/2  +  A/4,  0  <  h  <  26}, 

nl,s 

—  {(*1: 

h) 

|  1/2  -  A/4  <  t2  <  1/2,  0  <  ti  <  26}. 

Let 

Pwc  =  P  (w"tut2){u,v)  =  1). 

Note  that  according  to  Lemma  9  this  probability  does  not  depend  on  v. 
Lemma  12.  Ifw(u,v )  denotes  the  weights  used  in  the  NLM  algorithm,  then 


40 


we  have 


P 


P 


>  2e  +  2\ 

>  2e  +  2i 


I  log(cr) 


/  w(u.  v)dudv  —  /  Pu.adu 

JQ\  JQ\  ’  , 

w(u,v)dudv  —  [  puiyd% J  "  °  ^  ^  ^ 

Ql  ->Ql 


u/j 

=  °{ 

V  e  . 

f  \ 

„  i 

=  0 

/ 

\ 

Proof.  The  weights  w(u.  v)  over  the  pixelated  neighborhoods  can  be  rewritten 
in  the  following  way 


/  w(u,  v)dudv  —  /  y  w[u  +  2kS,v)dudv. 

JQ\  k=1 

Applying  the  Hoeffding  inequality,  it  is  straightforward  to  confirm  that 

1 

26 

w(u  +  2 kS,  v)  —  |  >  t  |  <  2e~ 


P 


fc=i 


28 


Now,  defining 


rA  =  | 

[(«,«)  e  Da 

1 

26 

|  ^  n;(u  +  2k8,  v)  - 

«} 

l 

k= 1 

) 

we  see  that 


P(A(f4)  -  A(TX)  >  e)  =  P  (  f  (1  -  I((u,v)  e  r \)dudv  >  e 

(u,v)eiilA  / 


< 


2e- 


where  the  last  step  is  due  to  Markov’s  Inequality.  Define  the  event 
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If  A(f^)  —  A(r'i)  <  e,  then  we  have  that 


w(u,v)dudv  —  /  p(u,a)dudv 
k  JQi 

w(u,  v)dudv  —  /  p(u.  a )dudv 
i  Jr\ 


Q\ 


w(u,v)dudv  —  /  w(u,v)dudv 


rk 


rk 


p(u.  a)dudv  —  I  p(u,  a)dudv 

Q  k 


<  2e  +  2tAS. 


Setting  t  =  y'  |1°f<T|  completes  the  proof.  □ 

Theorem  6.  Suppose  that  6,  ra  and  n  satisfy  Assumptions  Al  Af.  Then 
the  risk  of  SNLM  is 


inf  sup  R(f,fs)  =  n(a). 

■Wn  feHa(C) 

Proof  Let  (fi,£2)  £  f2^n.  We  will  calculate  the  bias  of  the  NLM  estimator 
at  this  point.  We  have 

/  fs  w(u,  v)X  (u,  v)dudv\ 2  /  /  fs  w(u,  v)X  {u,  v)dudv\\  ~ 

\  12  fsw(u,v)dudv  /  —  \  \  fsw(u,v)dudv  )) 
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which  leads  us  to  towards  the  lower  bound 


fs  w(u.  v)X (u,  v)dudv 


fs  w(u,  v)dudv 


>  E 


>  E 


>  E 


^  w(u,  v)X(u,  v)dudv 
( p(u,  v)dudv  —  ‘It  —  tS2  ^ 
fQs/ 4  w(u,  v)X (u,  v)dudv  \ 

Jqs  j)(u.  v)dudv  —  2 t  —  tS2  J 

Jqs/ 4  w(u'  v)f(u,  v)dudv  \ 
fc^  j){u.  v)dudv  —  2e  —  tS2  J 

fn,  w(u,v)dudv  \ 
_ **&/*  _  1  _ 

Jg  ^  p(u,  v)dudv  —  ‘It  —  tS 2  J 

Jq i  p(u ,  v)dudv  +  ‘2t  +  t  \ 

Jq  .  ^  p(u,  v)dudv  —  2 1  —  tS2  J 
fnl  p(u,  v)dudv  +  2 t  +  t 

J  ^6/4 

a  +  f20 1  p(u,  v)dudv  —  2c  —  tS2 


fs  w(u,  v)X ( u ,  v)dudv 
fs  w(u,  v)dudv 

h  I  jFne')  PtcnS) 


-  p(jrc  u  gc) 


-  P(TC  U  Gc) 


-  P(XC  u  Gc 


-  P{XC  u  gc 


Tng)  P(7r n g) 


-p(Fugc). 


(B.l) 


The  minimum  of  the  last  line  is  achieved  when  fQl  p(u.  v)dudv  is  minimized. 

*8/4 

However,  we  have  proved  in  Proposition  1  that  p(u)  >  po  for  (u,  v)  e  Q\/n- 
Therefore,  this  minimizing  integral  is  J2(<j).  When  we  substitute  this  opti¬ 
mum  value  in  the  lower  bound  (B.l),  we  see  that  the  risk  over  this  region 
is  0(1).  Therefore  the  bias  of  the  NLM  over  the  entire  image  is  fi(-)  or, 
equivalently,  f2(<r)  according  to  Assumption  A2.  □ 


Appendix  C.  Proof  of  Theorem  1 

Here  we  focus  on  the  case  of  a  —  2  and  use  the  standard  technique  of 
hypercube  construction  to  establish  the  lower  bound  [12].  Let  <j>  :  [0,1]  — > 
R+  U  {0}  be  a  two  times  differentiable  function  with  ||^|||oo  —  T  0(0)  — 


0(1)  =0.  S|i=0,and 


=  0.  Define 


t,m  —  Cm  2tj)(mt  —  i)  i  =  0, 1, . . . ,  m  —  1. 
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Set  ,/o  =  1{(2<o.5},  and  define  ipi>rn  =  l{t2<*iim(t,)+o.5}  -  /o-  Finally,  define 


/m  —  {/o  +  CiV4,m(/>  /))  Ci  £  {0,  1 } } • 

i=l 


Since  Jrm  c  H2(C).  we  have 

inf  sup  E||/  -  /HI  >  inf  sup  E||/  -  /|||.  (C.l) 

/  //2(C)  /  .Fm 

The  right  hand  side  of  (C.l)  can  be  calculated  more  easily  since  we  can 
restrict  our  attention  to  the  estimators  of  the  form  /o  +  Yff=\  CiV’i.m-  This  is 
due  to  the  fact  that  if  Pjrm  is  the  projection  onto  the  above  affine  space,  for 
every  /  G  P,n  we  have 

II  PfM)  -  /111  <  II  PjrJf)  -  PFmU)  111  <  11/  -  /111- 

Furthermore  for  any  /  G  T),,  we  have 

II/-/II2  =  HC-CII2, 

and  therefore  the  original  problem  reduces  to  one  of  estimating  C-  The  final 
simplification  is  due  to  the  fact  that  by  projecting  the  observations  onto  the 
space  of  signals  we  can  only  improve  the  estimation.  Therefore,  we  reduce 
the  problem  to  the  problem  of  estimating  (/, . . . .  Cm  from  the  observations 
Vi,  •  •  • ,  y„,  given  by 

Vf  -  [  +  /  ^j,mdW  (tU  t2) . 


Define  /?,m)  =  «m,  we  then  have 

yf  =  Km(i  +  wf, 

where  wf  ~  N(0,a2km).  Note  that  nm  =  0(m  3).  Consider  the  problem 
of  estimating  71  from  the  observation  71  +  N (0,  a2 / nm)  and  set  m  to  the 
smallest  integer  for  which  a1  j Krn  <  1.  If  the  risk  of  this  estimator  is  lower 
bounded  by  B,  the  risk  of  the  original  estimator  will  be  BrriK2n  —  ©(a4/3). 
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